TBMS: Domain Specific Text Management and Lexicon Development

Authors

  • Sebastian Goeser
  • Erhard Mergenthaler
Abstract

The definition of a Text Base Management System is introduced in terms of software engineering. This gives a basis for discussing practical text administration, including questions of corpus properties and appropriate retrieval criteria. Finally, strategies for deriving a word data base from an actual TBMS are discussed.

1. Introduction

Textual data are a kind of complex data object of growing importance in many applications. Research projects in fields as different as history, law, the social sciences, the humanities and linguistics, as well as commercial institutions, deal with vast quantities of text. At Ulm University, for instance, a machine-readable corpus of spoken language texts has been built up to support psychotherapeutic process research. The corpus is administered by a Text Base Management System (TBMS), which integrates the functions of archiving, processing and analyzing an arbitrary amount of text (MERGENTHALER 1985).

Several systems satisfying the TBMS definition were conceived independently in the late seventies. THALLER (1983) reports on CLIO, a TBMS with a highly differentiated data base component and a method base providing computerized content analysis and comfortable editing. LDVLIB (DREWEK and ERNI 1982) is mainly a text analysis package, in which data base management and text processing play a subordinate role. A PC-suited TBM-system, ARRAS (SMITH 1984), supports comfortable text inquiry through concordance and index functions, but has no textbase component. Finally, there are two TBM-systems for commercial use, MIDOC (KOWARSKI and MICHAUX 1983) and MINDOK (INFODAS 1983), which have a database component and allow extensive processing of text, but offer no text analysis at all.

2. Definition of a TBMS

From the point of view of a TBMS user, who is assumed to be a non-programming worker in an application field, the system is an instrument for taking up, controlling and/or analyzing his or her individual texts for domain-dependent purposes. Consequently, a system intended to manage a text bank has the following tasks:

  1. Input and editing of texts according to numerous points of view.
  2. Management of an unlimited number of text units on a suitably sized auxiliary storage device.
  3. Management of an unlimited amount of information concerning the text units, their authors, and the related text analyses.
  4. Management of an open quantity of methods for editing and analyzing stored text units.
  5. Assistance for interfaces to statistical and other user packages.
  6. Assistance for a simple, dialogue-oriented user interface …
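To make tasks 2 to 4 more concrete, the following is a minimal sketch in Python, not the authors' system. It assumes an in-memory store; the names `TextUnit`, `TextBase`, `register_method`, `select` and `word_frequencies` are all hypothetical and chosen only for illustration. It shows text units stored together with descriptive information, an open registry of analysis methods, and a word count of the kind from which a word data base could be derived.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TextUnit:
    """One stored text plus descriptive information (tasks 2 and 3)."""
    unit_id: str
    text: str
    info: Dict[str, str] = field(default_factory=dict)  # e.g. author, date


class TextBase:
    """Hypothetical in-memory text base with an open method registry (task 4)."""

    def __init__(self) -> None:
        self._units: Dict[str, TextUnit] = {}
        self._methods: Dict[str, Callable[[str], object]] = {}

    def add_unit(self, unit: TextUnit) -> None:
        """Store a text unit under its identifier."""
        self._units[unit.unit_id] = unit

    def register_method(self, name: str, method: Callable[[str], object]) -> None:
        """Add an analysis method; the set of methods stays open."""
        self._methods[name] = method

    def select(self, **criteria: str) -> List[TextUnit]:
        """Return the units whose information matches all given criteria."""
        return [
            unit for unit in self._units.values()
            if all(unit.info.get(key) == value for key, value in criteria.items())
        ]

    def analyze(self, name: str, unit_id: str) -> object:
        """Apply a registered method to the text of one stored unit."""
        return self._methods[name](self._units[unit_id].text)


def word_frequencies(text: str) -> Dict[str, int]:
    """Count lower-cased word forms, ignoring surrounding punctuation."""
    counts: Dict[str, int] = {}
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts
```

A caller would add units with `add_unit`, register `word_frequencies` under a name such as `"freq"`, and then call `analyze("freq", unit_id)`. Keeping the methods in a registry rather than hard-coding them is one way to read the "open quantity of methods" requirement; persistent storage and the dialogue-oriented interface are left out of this sketch.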

Similar resources

A Linguistic Analysis of Conference Titles in Applied Linguistics

Over the past twenty-five years, researchers have expressed considerable interest in titles of academic publications. Unfortunately, conference paper titles (CPTs) have only recently begun to receive attention. The aim of this study, therefore, is to investigate the text length, syntactic structure, and lexicon of CPTs in Applied Linguistics. A data set of 698 titles was selected from the 2008 ...

Development of a Conceptual Structure for a Domain-Specific Corpus

The corpus reported in this paper was developed for the evaluation of a domain-specific Text to Knowledge Mapping (TKM) prototype. The TKM prototype operates on the basis of both a combinatory categorical grammar (CCG) linguistic model and a knowledge model that consists of three layers: ontology, qualitative and quantitative layers. In the course of this evaluation it was necessary to populate...

Experimentation Made Easy

Scientifically sound network studies require the execution of large series of experiments. Researchers usually have to execute experiments manually, a labor-intensive and error-prone task, since there is no automation of the overall experimentation process. This task becomes especially hard on a distributed testbed and the researcher has to deal with additional challenges. In this paper we intr...

A Corpus-based Evaluation of a Domain-specific Text to Knowledge Mapping Prototype

The aim of this paper is to evaluate a Text to Knowledge Mapping (TKM) Prototype. The prototype is domain-specific, the purpose of which is to map instructional text onto a knowledge domain. The context of the knowledge domain is DC electrical circuit. During development, the prototype has been tested with a limited data set from the domain. The prototype reached a stage where it needs to be ev...

Publication date: 1986